
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>matminer (Materials Data Mining) &#8212; matminer 0.5.4 documentation</title>
    <link rel="stylesheet" href="_static/nature.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
 
<link href='https://fonts.googleapis.com/css?family=Lato:400,700' rel='stylesheet' type='text/css' />

  </head><body>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="nav-item nav-item-0"><a href="#">matminer 0.5.4 documentation</a> &#187;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <img alt="matminer logo" src="_images/matminer_logo_small.png" />
<div class="section" id="matminer">
<h1>matminer<a class="headerlink" href="#matminer" title="Permalink to this headline">¶</a></h1>
<p>matminer is a Python library for data mining the properties of materials. It contains routines for obtaining data on materials properties from various databases, featurizing complex materials attributes (e.g., composition, crystal structure, band structure) into physically-relevant numerical quantities, and analyzing the results of data mining.</p>
<p>matminer works with the <a class="reference external" href="https://pandas.pydata.org">pandas</a> data format in order to make various downstream machine learning libraries and tools available to materials science applications.</p>
<p>matminer is <a class="reference external" href="https://github.com/hackingmaterials/matminer">open source</a> via a BSD-style license.</p>
<div class="section" id="installing-matminer">
<h2>Installing matminer<a class="headerlink" href="#installing-matminer" title="Permalink to this headline">¶</a></h2>
<p>To install matminer, follow the short <a class="reference internal" href="installation.html"><span class="doc">installation tutorial.</span></a></p>
</div>
<div class="section" id="overview">
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2>
<p>Matminer makes it easy to:</p>
<ul class="simple">
<li><strong>obtain materials data from various sources</strong> into the <a class="reference external" href="https://pandas.pydata.org">pandas</a> data format. Through pandas, matminer enables professional-level data manipulation and analysis capabilities for materials data.</li>
<li><strong>transform and featurize complex materials attributes into numerical descriptors for data mining.</strong> For example, matminer can turn a composition such as “Fe3O4” into arrays of numbers representing things like average electronegativity or difference in ionic radii of the substituent elements. Matminer also contains sophisticated crystal structure and site featurizers (e.g., obtaining the coordination number or local environment of atoms in the structure) as well as featurizers for complex materials data such as band structures and density of states. All of these various featurizers are available under a consistent interface, making it easy to try different types of materials descriptors for an analysis and to transform materials science objects into physically-relevant numbers for data mining. A full <a class="reference internal" href="featurizer_summary.html"><span class="doc">Table of Featurizers</span></a> is available.</li>
<li><strong>perform data mining on materials</strong>. Although matminer itself does not contain implementations of machine learning algorithms, it makes it easy to prepare and transform data sets for use with standard data mining packages such as <a class="reference external" href="http://scikit-learn.org">scikit-learn</a>. See our examples for more details.</li>
<li><strong>generate interactive plots</strong> through an interface to the <a class="reference external" href="https://plot.ly">plotly</a> visualization package.</li>
</ul>
<p>A general workflow and overview of matminer’s capabilities is presented below:</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<a class="reference internal image-reference" href="_images/Flowchart.png"><img alt="Flow chart of matminer features" class="align-center" src="_images/Flowchart.png" style="width: 1000px;" /></a>
<div class="line-block">
<div class="line"><br /></div>
<div class="line"><br /></div>
</div>
<p>Take a tour of matminer’s features by scrolling down!</p>
</div>
<div class="section" id="data-retrieval-tools">
<h2>Data retrieval tools<a class="headerlink" href="#data-retrieval-tools" title="Permalink to this headline">¶</a></h2>
<div class="section" id="retrieve-data-from-the-biggest-materials-databases-such-as-the-materials-project-and-citrine-s-databases-in-a-pandas-dataframe-format">
<h3>Retrieve data from the biggest materials databases, such as the Materials Project and Citrine’s databases, in a Pandas dataframe format<a class="headerlink" href="#retrieve-data-from-the-biggest-materials-databases-such-as-the-materials-project-and-citrine-s-databases-in-a-pandas-dataframe-format" title="Permalink to this headline">¶</a></h3>
<p>The <a class="reference external" href="https://github.com/hackingmaterials/matminer/blob/master/matminer/data_retrieval/retrieve_MP.py">MPDataRetrieval</a> and <a class="reference external" href="https://github.com/hackingmaterials/matminer/blob/master/matminer/data_retrieval/retrieve_Citrine.py">CitrineDataRetrieval</a> classes can be used to retrieve data from the biggest open-source materials database collections of the <a class="reference external" href="https://www.materialsproject.org/">Materials Project</a> and <a class="reference external" href="https://citrination.com/">Citrine Informatics</a>, respectively, in a <a class="reference external" href="http://pandas.pydata.org/">Pandas</a> dataframe format. These databases hold a variety of material properties, obtained in-house or from external databases, that are calculated, measured experimentally, or learned from trained algorithms. The <code class="code docutils literal notranslate"><span class="pre">get_dataframe</span></code> method of these classes retrieves data in three steps: it searches the respective database using user-specified filters (such as compound/material or property type), extracts the selected data in a JSON/dictionary format through the API, and parses the result into a Pandas dataframe whose columns are the measured or calculated properties/features and whose rows are data points.</p>
<p>For example, to compare experimental and computed band gaps of Si, one can employ the following lines of code:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">matminer.data_retrieval.retrieve_Citrine</span> <span class="kn">import</span> <span class="n">CitrineDataRetrieval</span>
<span class="kn">from</span> <span class="nn">matminer.data_retrieval.retrieve_MP</span> <span class="kn">import</span> <span class="n">MPDataRetrieval</span>

<span class="n">df_citrine</span> <span class="o">=</span> <span class="n">CitrineDataRetrieval</span><span class="p">()</span><span class="o">.</span><span class="n">get_dataframe</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s1">&#39;Si&#39;</span><span class="p">,</span> <span class="nb">property</span><span class="o">=</span><span class="s1">&#39;band gap&#39;</span><span class="p">,</span> <span class="n">data_type</span><span class="o">=</span><span class="s1">&#39;EXPERIMENTAL&#39;</span><span class="p">)</span>
<span class="n">df_mp</span> <span class="o">=</span> <span class="n">MPDataRetrieval</span><span class="p">()</span><span class="o">.</span><span class="n">get_dataframe</span><span class="p">(</span><span class="n">criteria</span><span class="o">=</span><span class="s1">&#39;Si&#39;</span><span class="p">,</span> <span class="n">properties</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;band_gap&#39;</span><span class="p">])</span>
</pre></div>
</div>
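<p>The parsing step described above, in which JSON/dictionary records become rows and columns, can be sketched without network access or API keys. This is a minimal illustration using only the Python standard library, not matminer’s implementation: the records, the <code class="code docutils literal notranslate"><span class="pre">flatten</span></code> helper, and the dotted column names are all assumptions made for the example, and the values are placeholders rather than real database entries.</p>

```python
# Hypothetical records, shaped like the JSON an API might return.
# The values are placeholders for illustration, not real database entries.
records = [
    {"material_id": "example-1", "formula": "Si", "properties": {"band_gap": 0.6}},
    {"material_id": "example-2", "formula": "Si", "properties": {"band_gap": 0.4, "density": 2.3}},
]


def flatten(record, prefix=""):
    """Turn a nested dict into a flat dict with dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# One flat dict per data point (row).
rows = [flatten(r) for r in records]

# Columns are the union of all keys, in first-seen order.
columns = []
for row in rows:
    for key in row:
        if key not in columns:
            columns.append(key)

# Fill missing properties with None so every row has every column.
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

<p>If pandas is installed, passing the same list of flat dictionaries to <code class="code docutils literal notranslate"><span class="pre">pandas.DataFrame(rows)</span></code> builds an equivalent dataframe, with missing entries filled in as NaN.</p>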
<p><a class="reference external" href="https://github.com/hackingmaterials/matminer/blob/master/matminer/data_retrieval/retrieve_MongoDB.py">MongoDataRetrieval</a> is another data retrieval tool. It parses any <a class="reference external" href="https://www.mongodb.com/">MongoDB</a> collection (which follows a flexible JSON schema) into a Pandas dataframe whose format is similar to the output of the retrieval tools above. The arguments of the <code class="code docutils literal notranslate"><span class="pre">get_dataframe</span></code> method let you use MongoDB’s rich and powerful query/aggregation syntax. More information on customizing queries can be found in the <a class="reference external" href="https://docs.mongodb.com/manual/">MongoDB documentation</a>.</p>
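<p>For readers unfamiliar with MongoDB, a query is itself a JSON-like document in which each field maps either to a value (equality) or to a dictionary of operators such as <code class="code docutils literal notranslate"><span class="pre">$gt</span></code> and <code class="code docutils literal notranslate"><span class="pre">$lt</span></code>. The sketch below only mimics a small subset of that syntax in plain Python so the shape of a query can be seen without a running database. The <code class="code docutils literal notranslate"><span class="pre">matches</span></code> function and the sample documents are hypothetical and are not part of matminer or MongoDB.</p>

```python
import operator

# Subset of MongoDB comparison operators covered by this sketch.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, allowed: value in allowed,
}


def matches(document, query):
    """Return True if a flat document satisfies every condition in the query.

    Hypothetical helper: it imitates the query-document shape only and does
    not reproduce MongoDB's full matching rules.
    """
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"band_gap": {"$gt": 0.5, "$lt": 1.2}}
            for op, operand in condition.items():
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # Plain form, e.g. {"formula": "Si"}, means equality
            return False
    return True


# Made-up documents standing in for a collection; values are illustrative.
docs = [
    {"formula": "Si", "band_gap": 1.1},
    {"formula": "GaAs", "band_gap": 1.4},
    {"formula": "Cu", "band_gap": 0.0},
]

# Select documents whose band gap lies strictly between 0.5 and 1.2.
query = {"band_gap": {"$gt": 0.5, "$lt": 1.2}}
selected = [d["formula"] for d in docs if matches(d, query)]
print(selected)
```

<p>In practice the query document is handed to the database, which evaluates it server-side; only the shape of the dictionary carries over from this sketch.</p>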
</div>
<div class="section" id="access-ready-made-datasets-for-exploratory-analysis-benchmarking-and-testing-without-ever-leaving-the-python-interpreter">
<h3>Access ready-made datasets for exploratory analysis, benchmarking, and testing without ever leaving the Python interpreter<a class="headerlink" href="#access-ready-made-datasets-for-exploratory-analysis-benchmarking-and-testing-without-ever-leaving-the-python-interpreter" title="Permalink to this headline">¶</a></h3>
<p>The datasets module provides an ever-growing collection of materials science datasets that have been collected, formatted as pandas dataframes, and made available through a unified interface.</p>
<p>Loading a dataset as a pandas dataframe is as simple as:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">matminer.datasets</span> <span class="kn">import</span> <span class="n">load_dataset</span>

<span class="n">df</span> <span class="o">=</span> <span class="n">load_dataset</span><span class="p">(</span><span class="s2">&quot;jarvis_dft_3d&quot;</span><span class="p">)</span>
</pre></div>
</div>
<p>Or use the dataset-specific convenience loader to access operations common to that dataset:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">matminer.datasets.convenience_loaders</span> <span class="kn">import</span> <span class="n">load_jarvis_dft_3d</span>

<span class="n">df</span> <span class="o">=</span> <span class="n">load_jarvis_dft_3d</span><span class="p">(</span><span class="n">drop_nan_columns</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;bulk modulus&quot;</span><span class="p">])</span>
</pre></div>
</div>
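<p>The <code class="code docutils literal notranslate"><span class="pre">drop_nan_columns</span></code> argument above is assumed here to drop the rows that have a missing value in any of the listed columns. The standard-library sketch below shows that filtering idea on made-up rows. The helper names are hypothetical and this is not the loader’s actual code.</p>

```python
import math

# Hypothetical rows standing in for a loaded dataset; values are placeholders.
rows = [
    {"formula": "A", "bulk modulus": 100.0, "shear modulus": 50.0},
    {"formula": "B", "bulk modulus": float("nan"), "shear modulus": 40.0},
    {"formula": "C", "bulk modulus": 80.0, "shear modulus": float("nan")},
]


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_rows_with_nan(rows, columns):
    """Keep only rows that have a value in every listed column."""
    return [row for row in rows if not any(is_missing(row.get(c)) for c in columns)]


# Row "B" is dropped; row "C" stays because only "bulk modulus" is checked.
kept = drop_rows_with_nan(rows, ["bulk modulus"])
print([row["formula"] for row in kept])
```

<p>On a pandas dataframe, the same filter is available as <code class="code docutils literal notranslate"><span class="pre">df.dropna(subset=[&quot;bulk modulus&quot;])</span></code>.</p>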
<p>See <a class="reference internal" href="dataset_summary.html"><span class="doc">the dataset summary page</span></a> for a comprehensive summary of
datasets available within matminer. If you would like to contribute a dataset to matminer’s
repository see <a class="reference internal" href="dataset_addition_guide.html"><span class="doc">the dataset addition guide</span></a>.</p>
</div>
</div>
<div class="section" id="data-descriptor-tools">
<h2>Data descriptor tools<a class="headerlink" href="#data-descriptor-tools" title="Permalink to this headline">¶</a></h2>
<div class="section" id="decorate-the-dataframe-with-composition-structural-and-or-band-structure-descriptors-features">
<h3>Decorate the dataframe with <a class="reference internal" href="featurizer_summary.html"><span class="doc">composition, structural, and/or band structure descriptors/features</span></a><a class="headerlink" href="#decorate-the-dataframe-with-composition-structural-and-or-band-structure-descriptors-features" title="Permalink to this headline">¶</a></h3>
<p>We have developed utilities that describe a material from its composition or structure and represent that description numerically, so that it is readily usable as features.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
<a class="reference internal image-reference" href="_images/featurizer_diagram.png"><img alt="matminer featurizers" class="align-center" src="_images/featurizer_diagram.png" style="width: 1200px;" /></a>
<div class="line-block">
<div class="line"><br /></div>
<div class="line"><br /></div>
</div>
<p>For now, check out the examples below to see how to use the descriptor functionality, or tour our <a class="reference internal" href="featurizer_summary.html"><span class="doc">Table of Featurizers.</span></a></p>
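<p>To make the idea of featurization concrete, the sketch below turns a composition such as “Fe3O4” into two numbers: the fraction-weighted mean Pauling electronegativity and the electronegativity range. It uses only the Python standard library and is not matminer’s featurizer code. The function names, the output labels, and the simple formula parser (element symbols with optional integer amounts, no parentheses) are assumptions made for illustration.</p>

```python
import re

# Pauling electronegativities for the two elements used in this example.
ELECTRONEGATIVITY = {"Fe": 1.83, "O": 3.44}


def parse_formula(formula):
    """Parse a simple formula such as "Fe3O4" into {"Fe": 3, "O": 4}.

    Handles element symbols followed by optional integer amounts only;
    parentheses, decimals, and charges are out of scope for this sketch.
    """
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        amounts[symbol] = amounts.get(symbol, 0) + (int(count) if count else 1)
    return amounts


def featurize(formula):
    """Return two composition descriptors (hypothetical labels)."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    # Atomic fraction of each element in the composition.
    fractions = {el: n / total for el, n in amounts.items()}
    values = [ELECTRONEGATIVITY[el] for el in amounts]
    mean_en = sum(fractions[el] * ELECTRONEGATIVITY[el] for el in amounts)
    return {
        "mean electronegativity": mean_en,
        "electronegativity range": max(values) - min(values),
    }


features = featurize("Fe3O4")
print(features)
```

<p>For Fe3O4 the mean works out to (3 × 1.83 + 4 × 3.44) / 7 = 2.75. matminer’s own featurizers compute many such statistics over many elemental properties behind a consistent interface.</p>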
</div>
</div>
<div class="section" id="plotting-tools">
<h2>Plotting tools<a class="headerlink" href="#plotting-tools" title="Permalink to this headline">¶</a></h2>
<div class="section" id="plot-data-from-either-arrays-or-dataframes-using-plotly-with-figrecipes">
<h3>Plot data from either arrays or dataframes using <a class="reference external" href="https://plot.ly/">Plotly</a> with figrecipes<a class="headerlink" href="#plot-data-from-either-arrays-or-dataframes-using-plotly-with-figrecipes" title="Permalink to this headline">¶</a></h3>
<p>The figrecipes module of matminer provides utilities that make it easier and faster to plot common figures with Plotly. It lets you create plots from your data in just a few lines of code, drawing on Plotly’s wide and flexible functionality while shielding you from its complexity.
Check out some example code and the figure generated with figrecipes:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">matminer</span> <span class="kn">import</span> <span class="n">PlotlyFig</span>
<span class="kn">from</span> <span class="nn">matminer.datasets</span> <span class="kn">import</span> <span class="n">load_dataset</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">load_dataset</span><span class="p">(</span><span class="s2">&quot;elastic_tensor_2015&quot;</span><span class="p">)</span>
<span class="n">pf</span> <span class="o">=</span> <span class="n">PlotlyFig</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="n">y_title</span><span class="o">=</span><span class="s1">&#39;Bulk Modulus (GPa)&#39;</span><span class="p">,</span> <span class="n">x_title</span><span class="o">=</span><span class="s1">&#39;Shear Modulus (GPa)&#39;</span><span class="p">,</span> <span class="n">filename</span><span class="o">=</span><span class="s1">&#39;bulk_shear_moduli&#39;</span><span class="p">)</span>
<span class="n">pf</span><span class="o">.</span><span class="n">xy</span><span class="p">((</span><span class="s1">&#39;G_VRH&#39;</span><span class="p">,</span> <span class="s1">&#39;K_VRH&#39;</span><span class="p">),</span> <span class="n">labels</span><span class="o">=</span><span class="s1">&#39;material_id&#39;</span><span class="p">,</span> <span class="n">colors</span><span class="o">=</span><span class="s1">&#39;poisson_ratio&#39;</span><span class="p">,</span> <span class="n">colorscale</span><span class="o">=</span><span class="s1">&#39;Picnic&#39;</span><span class="p">,</span> <span class="n">limits</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;x&#39;</span><span class="p">:</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">300</span><span class="p">)})</span>
</pre></div>
</div>
<p>This code generates the following figure from the matminer elastic dataset dataframe.</p>
<iframe src="_static/bulk_shear_moduli.html" height="1000px" width="90%" align="center" frameBorder="0">Browser not compatible.</iframe><p>The Plotly module contains the <code class="code docutils literal notranslate"><span class="pre">PlotlyFig</span></code> class that wraps around Plotly’s Python API and follows its JSON schema. Check out the examples below to see how to use the plotting functionality!</p>
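<p>As a rough picture of the JSON schema mentioned above, a Plotly figure is a nested structure with a <code class="code docutils literal notranslate"><span class="pre">data</span></code> list of traces and a <code class="code docutils literal notranslate"><span class="pre">layout</span></code> dictionary. The sketch below writes out by hand a scatter figure similar in spirit to the example above, using only the standard library. The data values and labels are made up, and this is not necessarily the structure that <code class="code docutils literal notranslate"><span class="pre">PlotlyFig</span></code> emits.</p>

```python
import json

# Made-up values for illustration only.
shear = [30.0, 80.0, 150.0]
bulk = [60.0, 140.0, 250.0]
poisson = [0.29, 0.26, 0.25]
labels = ["example-1", "example-2", "example-3"]

figure = {
    # Each trace describes one set of points and how to draw it.
    "data": [
        {
            "type": "scatter",
            "mode": "markers",
            "x": shear,
            "y": bulk,
            "text": labels,  # hover labels, one per point
            "marker": {"color": poisson, "colorscale": "Picnic", "showscale": True},
        }
    ],
    # The layout holds axis titles, ranges, and other figure-wide settings.
    "layout": {
        "xaxis": {"title": {"text": "Shear Modulus (GPa)"}, "range": [0, 300]},
        "yaxis": {"title": {"text": "Bulk Modulus (GPa)"}},
    },
}

figure_json = json.dumps(figure, indent=2)
print(figure_json)
```

<p>A wrapper class saves you from writing this structure by hand: the column names, titles, and limits passed as arguments end up in the corresponding trace and layout entries.</p>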
</div>
</div>
<div class="section" id="examples">
<h2>Examples<a class="headerlink" href="#examples" title="Permalink to this headline">¶</a></h2>
<p>Check out some examples of how to use matminer!</p>
<ol class="arabic simple">
<li>Use matminer and scikit-learn to create a model that predicts bulk modulus of materials. (<a class="reference external" href="https://nbviewer.jupyter.org/github/hackingmaterials/matminer_examples/blob/master/notebooks/machine-learning/intro_predicting_bulk_modulus.ipynb">Jupyter Notebook</a>)</li>
<li>Compare and plot experimentally measured band gaps from Citrine with computed values from the Materials Project (<a class="reference external" href="https://nbviewer.jupyter.org/github/hackingmaterials/matminer_examples/blob/master/notebooks/experiment_vs_computed_bandgap.ipynb">Jupyter Notebook</a>)</li>
<li>Compare and plot U-O bond lengths in various compounds from the MPDS (<a class="reference external" href="https://nbviewer.jupyter.org/github/hackingmaterials/matminer_examples/blob/master/notebooks/u-o_bondlength_analysis.ipynb">Jupyter Notebook</a>)</li>
<li>Retrieve data from various online materials repositories (<a class="reference external" href="https://nbviewer.jupyter.org/github/hackingmaterials/matminer_examples/blob/master/notebooks/data_retrieval_basics.ipynb">Jupyter Notebook</a>)</li>
<li>Basic Visualization using FigRecipes (<a class="reference external" href="https://nbviewer.jupyter.org/github/hackingmaterials/matminer_examples/blob/master/notebooks/visualization_with_figrecipes.ipynb">Jupyter Notebook</a>)</li>
<li>Advanced Visualization (<a class="reference external" href="https://nbviewer.jupyter.org/github/hackingmaterials/matminer_examples/blob/master/notebooks/advanced_visualization.ipynb">Jupyter Notebook</a>)</li>
<li>Running a kernel ridge regression model on vector descriptors (<a class="reference external" href="https://github.com/hackingmaterials/matminer_examples/blob/master/scripts/kernel_ridge_SCM_OFM.py">Python script</a>)</li>
<li>Many more examples! See the <a class="reference external" href="https://github.com/hackingmaterials/matminer_examples">matminer_examples</a> repo for details.</li>
</ol>
</div>
<div class="section" id="citing-matminer">
<h2>Citing matminer<a class="headerlink" href="#citing-matminer" title="Permalink to this headline">¶</a></h2>
<p>If you find matminer useful, please encourage its development by citing the following paper in your research:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>Ward, L., Dunn, A., Faghaninia, A., Zimmermann, N. E. R., Bajaj, S., Wang, Q.,
Montoya, J. H., Chen, J., Bystrom, K., Dylla, M., Chard, K., Asta, M., Persson,
K., Snyder, G. J., Foster, I., Jain, A., Matminer: An open source toolkit for
materials data mining. Comput. Mater. Sci. 152, 60-69 (2018).
</pre></div>
</div>
</div>
<div class="section" id="changelog">
<h2>Changelog<a class="headerlink" href="#changelog" title="Permalink to this headline">¶</a></h2>
<p>Check out our full changelog <a class="reference internal" href="changelog.html"><span class="doc">here.</span></a></p>
</div>
<div class="section" id="contributions-and-support">
<h2>Contributions and Support<a class="headerlink" href="#contributions-and-support" title="Permalink to this headline">¶</a></h2>
<p>Want to see something added or changed? Here are a few ways you can help:</p>
<ul class="simple">
<li>Help us improve the documentation. Tell us where you got ‘stuck’ and improve the install process for everyone.</li>
<li>Let us know about areas of the code that are difficult to understand or use.</li>
<li>Contribute code! Fork our <a class="reference external" href="https://github.com/hackingmaterials/matminer">GitHub repo</a> and make a pull request.</li>
</ul>
<p>Submit all questions and contact requests to the <a class="reference external" href="https://groups.google.com/forum/#!forum/matminer">Google group</a>.</p>
<p>A comprehensive guide to contributions can be found <a class="reference external" href="https://github.com/hackingmaterials/matminer/blob/master/CONTRIBUTING.md">here.</a></p>
<p>A full list of contributors can be found <a class="reference internal" href="contributors.html"><span class="doc">here.</span></a></p>
</div>
</div>
<div class="section" id="code-documentation">
<h1>Code documentation<a class="headerlink" href="#code-documentation" title="Permalink to this headline">¶</a></h1>
<p>Autogenerated code documentation below:</p>
<ul class="simple">
<li><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></li>
<li><a class="reference internal" href="py-modindex.html"><span class="std std-ref">Module Index</span></a></li>
<li><a class="reference internal" href="search.html"><span class="std std-ref">Search Page</span></a></li>
</ul>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="#">Table of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">matminer</a><ul>
<li><a class="reference internal" href="#installing-matminer">Installing matminer</a></li>
<li><a class="reference internal" href="#overview">Overview</a></li>
<li><a class="reference internal" href="#data-retrieval-tools">Data retrieval tools</a><ul>
<li><a class="reference internal" href="#retrieve-data-from-the-biggest-materials-databases-such-as-the-materials-project-and-citrine-s-databases-in-a-pandas-dataframe-format">Retrieve data from the biggest materials databases, such as the Materials Project and Citrine’s databases, in a Pandas dataframe format</a></li>
<li><a class="reference internal" href="#access-ready-made-datasets-for-exploratory-analysis-benchmarking-and-testing-without-ever-leaving-the-python-interpreter">Access ready-made datasets for exploratory analysis, benchmarking, and testing without ever leaving the Python interpreter</a></li>
</ul>
</li>
<li><a class="reference internal" href="#data-descriptor-tools">Data descriptor tools</a><ul>
<li><a class="reference internal" href="#decorate-the-dataframe-with-composition-structural-and-or-band-structure-descriptors-features">Decorate the dataframe with composition, structural, and/or band structure descriptors/features</a></li>
</ul>
</li>
<li><a class="reference internal" href="#plotting-tools">Plotting tools</a><ul>
<li><a class="reference internal" href="#plot-data-from-either-arrays-or-dataframes-using-plotly-with-figrecipes">Plot data from either arrays or dataframes using Plotly with figrecipes</a></li>
</ul>
</li>
<li><a class="reference internal" href="#examples">Examples</a></li>
<li><a class="reference internal" href="#citing-matminer">Citing matminer</a></li>
<li><a class="reference internal" href="#changelog">Changelog</a></li>
<li><a class="reference internal" href="#contributions-and-support">Contributions and Support</a></li>
</ul>
</li>
<li><a class="reference internal" href="#code-documentation">Code documentation</a></li>
</ul>

  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="_sources/index.rst.txt"
            rel="nofollow">Show Source</a></li>
    </ul>
   </div>
<div id="searchbox" style="display: none" role="search">
  <h3>Quick search</h3>
    <div class="searchformwrapper">
    <form class="search" action="search.html" method="get">
      <input type="text" name="q" />
      <input type="submit" value="Go" />
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
    </div>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="nav-item nav-item-0"><a href="#">matminer 0.5.4 documentation</a> &#187;</li> 
      </ul>
    </div>

    <div class="footer" role="contentinfo">
        &#169; Copyright 2015, Anubhav Jain.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.8.2.
    </div>

  </body>
</html>